Experience with building a commodity Intel-based ccNUMA system
Abstract
Commercial Cache-Coherent Non-Uniform Memory Access (ccNUMA) systems often require extensive investments in hardware design and operating system support. A different approach to building these systems is to use Standard High Volume (SHV) hardware and stock software components as building blocks and assemble them with minimal investments in hardware and software. This design approach trades the performance advantages of specialized hardware design for simplicity and implementation speed, and relies on application-level tuning for scalability and performance. We present our experience with this approach in this article.
Similar resources
Proceedings of the 3rd USENIX Windows NT Symposium
We have built a 16-way, ccNUMA multiprocessor prototype to study the feasibility of building large scale servers out of Standard High Volume (SHV) components. Using a cache-coherent interconnect, our prototype combines four 4-processor SMPs built using 350MHz Intel Xeon processors, yielding a 16-way system with a total of 4 GBytes of physical memory distributed over the nodes. Such an environme...
Multi-Threading Performance on Commodity Multi-Core Processors
Commodity servers based on multi-core processors have recently become building blocks for high performance computing Linux clusters. The multi-core processors deliver better performance-to-cost ratios relative to their single-core predecessors through on-chip multi-threading. However, they present challenges in developing high performance multi-threaded code. In this paper we study the performance of d...
Accessing Data on SGI Altix: An Experience with Reality
The SGI Altix system architecture supports very large ccNUMA shared memory systems. Nevertheless, the system layout sets limits on the sustained memory performance, which can only be avoided by selecting the “right” data access strategies. The paper presents the results of cache and memory performance studies on SGI Altix 350. It demonstrates limitations and benefits of the system a...
Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems
Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (TBB) components, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks if locality of data access is important, which is typically the case for memory-bound code on ccNUMA systems. We present a thin software layer amel...
Implementing Transparent Shared Memory on Clusters Using Virtual Machines
Shared memory systems, such as SMP and ccNUMA topologies, simplify programming and administration. On the other hand, clusters of individual workstations are commonly used due to cost and scalability considerations. We have developed a virtual-machine-based solution, dubbed vNUMA, that seeks to provide a NUMA-like environment on a commodity cluster, with a single operating system instance and t...
Journal title:
- IBM Journal of Research and Development
Volume 45
Publication date: 2001